Detecting Anomalous Cybersecurity Events Using LSTM Autoencoders

A Case Study on the BETH Dataset

Lionel & Killian (Advisor: Dr. Cohen)

2025-08-05

Introduction

  • Modern infrastructures face sophisticated, process-level intrusions that mimic benign behavior.
  • Static rule-based detection fails to capture temporal dependencies.
  • We investigate LSTM-based anomaly detection for process logs.
  • Key idea: model normal system activity and detect deviations as anomalies.

LSTM Motivation

  • LSTM networks (Hochreiter and Schmidhuber 1997):
    • Solve vanishing gradient problem with gated memory cells.
    • Capable of learning long-range temporal dependencies.
  • Applications:
    • Speech recognition (Graves et al. 2013).
    • Language modeling (Sundermeyer et al. 2012).
    • Clinical anomaly detection (Lipton et al. 2016).
  • Why relevant?
    • Cybersecurity logs are sequential — order and timing matter.

Cybersecurity Context

  • Kim et al. 2016: LSTM classifiers for intrusion detection outperform static baselines.

  • Malhotra et al. 2016: LSTM encoder-decoder reconstructs normal sequences, anomalies detected by high reconstruction error.

  • Yin et al. 2017: RNNs generalize better on raw network flows.

  • Cinque et al. 2022: Micro2vec + LSTM captures log anomalies in microservices.

Data Challenge

  • Old datasets (KDD’99, NSL-KDD, ISCX 2012) are synthetic and outdated.

  • BETH dataset (Highnam et al. 2021):

    • 8M+ process-level events (eBPF-instrumented honeypots).
    • Rich features: timestamps, syscalls, labels (benign vs. malicious).
    • Realistic temporal and adversarial patterns for modern systems.

Our Approach

  • Semi-supervised LSTM Autoencoder:
    • Train only on benign sequences (normality modeling).
    • Use reconstruction error to flag anomalies.
  • Goal: detect process-level attacks in test logs.

Methods


Understanding LSTM Networks

  • Recurrent Neural Network (RNN) with memory
  • Keeps track of long sequences
  • Ideal for logs, time series, and system calls

LSTM Fundamentals

  • Forget Gate:
    \(f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)\)

  • Input Gate and Candidate Update:
    \(i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)\)
    \(\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)\)

  • Cell State Update:
    \(C_t = f_t * C_{t-1} + i_t * \tilde{C}_t\)

  • Output Gate and Hidden State:
    \(o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \;\; h_t = o_t * \tanh(C_t)\)

Figure 1: A module of an LSTM network (Trinh et al. 2021)
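The gate equations above can be traced in a few lines of NumPy. The dimensions and random weights here are illustrative only, not the trained model's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, features = 4, 3  # toy sizes for illustration

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, applied to the concatenation [h_{t-1}, x_t].
W_f, W_i, W_c, W_o = (rng.normal(size=(hidden, hidden + features)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    C_tilde = np.tanh(W_c @ z + b_c)      # candidate update
    C_t = f_t * C_prev + i_t * C_tilde    # cell state update
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(C_t)              # hidden state
    return h_t, C_t

h, C = np.zeros(hidden), np.zeros(hidden)
h, C = lstm_step(rng.normal(size=features), h, C)
```

Because \(h_t = o_t * \tanh(C_t)\) with \(o_t \in (0,1)\), the hidden state stays bounded in \((-1, 1)\).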

LSTM Autoencoder for Anomaly Detection

Figure 2: LSTM Autoencoder for Anomaly Detection (Trinh et al. 2021)

Anomaly Detection from Reconstruction Error

\[ L = \frac{1}{T} \sum_{t=1}^T \| x_t - \hat{x}_t \|^2 \]
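As a sketch, this per-sequence error can be computed over a batch of (T, d) sequences with toy arrays standing in for BETH data:

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # x, x_hat: (batch, T, d) arrays; returns one scalar error per sequence:
    # squared L2 norm per timestep, averaged over the T timesteps.
    return np.mean(np.sum((x - x_hat) ** 2, axis=-1), axis=-1)

x = np.zeros((2, 8, 9))       # two sequences of 8 timesteps, 9 features
x_hat = np.zeros((2, 8, 9))
x_hat[1] += 0.5               # second sequence reconstructed poorly
errors = reconstruction_error(x, x_hat)
# errors[0] = 0.0; errors[1] = 9 * 0.5**2 = 2.25
```

A sequence whose error exceeds a chosen threshold is flagged as anomalous.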

The BETH Dataset

  • Dataset size: 8,004,918 system events.
  • Training subset: ~763,145 benign events (60/20/20 split).
Feature          Description
timestamp        Date and time when the event occurred (float)
processId        ID of the process generating the event
threadId         ID of the thread performing the operation
parentProcessId  ID of the parent process
userId           User running the process/event
mountNamespace   Kernel namespace for filesystem isolation
processName      Name of the executable or program
hostName         Name or IP of the machine
eventId          Numeric identifier for the event
eventName        Name/type of the system call/event
stackAddresses   List of memory addresses (call stack)
argsNum          Number of arguments for the event
returnValue      Return value of the system call/event
args             List of arguments (name, type, value)
sus              1 if flagged suspicious, 0 otherwise
evil             1 if event is malicious, 0 otherwise

Data Preprocessing and Sequence Modeling

Preparing the Data

  • Feature Selection

  • Encoding:
    One-hot encoding for categorical features

  • Sequence Generation: sliding windows reshape logs into 3D tensors (samples × timesteps × features)

  • Sequence Labeling:
    Normal: Only benign events
    Anomalous: At least one suspicious (Sus) or malicious (Evil) event
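A minimal sketch of the windowing and labeling step, assuming a window of 8 events (matching the model input shape) and a combined sus/evil flag per event; the data here is synthetic:

```python
import numpy as np

def make_sequences(features, flags, window=8, step=1):
    # features: (n_events, n_features) encoded log; flags: (n_events,) bool
    # where True marks a suspicious (sus) or malicious (evil) event.
    X, y = [], []
    for start in range(0, len(features) - window + 1, step):
        X.append(features[start:start + window])
        # A window is anomalous if it contains at least one flagged event.
        y.append(int(flags[start:start + window].any()))
    return np.array(X), np.array(y)

feats = np.random.rand(20, 9)   # 20 events, 9 encoded features
flags = np.zeros(20, dtype=bool)
flags[10] = True                # one malicious event
X, y = make_sequences(feats, flags)
# X.shape == (13, 8, 9); the 8 windows covering event 10 get label 1
```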

LSTM Training & Architecture

  • Trained on normal sequences only.
  • Symmetric autoencoder: stacked LSTM layers with a bottleneck.
  • Optimized with Adam, minimizing reconstruction loss.
  • Generalization: early stopping, dropout, and 5-fold cross-validation.
  • Hyperparameters tuned following prior work (Nguyen et al. 2021; Malhotra et al. 2016).

Threshold Selection and Evaluation Metrics

Threshold selection via:

  • 95th percentile of training errors
  • F1-score optimization (validation set)
  • Joint minimization of FP/FN rates
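The first two threshold rules can be sketched as follows, with synthetic reconstruction errors standing in for the model's actual outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins: benign errors are small, anomalous errors larger.
train_err = rng.exponential(0.05, size=1000)            # benign-only training errors
val_err = np.concatenate([rng.exponential(0.05, 900),
                          rng.exponential(0.5, 100)])   # mixed validation errors
val_y = np.concatenate([np.zeros(900), np.ones(100)])   # 1 = anomalous

# Rule 1: 95th percentile of training (benign) reconstruction errors.
tau_p95 = np.percentile(train_err, 95)

def f1(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Rule 2: sweep candidate thresholds, keep the F1-maximizing one.
candidates = np.quantile(val_err, np.linspace(0.5, 0.999, 200))
tau_f1 = max(candidates, key=lambda t: f1(val_y, (val_err > t).astype(int)))
```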

Evaluation metrics:

  • Precision, Recall, F1-score
  • ROC-AUC, Confusion Matrix
  • Error distribution plots
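ROC-AUC can be read directly from the error scores without fixing a threshold: it equals the probability that a randomly chosen anomalous sequence scores higher than a randomly chosen normal one (the Mann-Whitney view). A small illustrative sketch:

```python
import numpy as np

def roc_auc(y_true, scores):
    # Pairwise comparison of anomalous vs. normal scores; ties count half.
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([0, 0, 0, 1, 1])
s = np.array([0.1, 0.2, 0.3, 0.25, 0.9])
# One of the six (pos, neg) pairs is misordered (0.25 < 0.3): AUC = 5/6
```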

Data Exploration and Visualization

Event Frequency Distribution

Process Behavior

Temporal Skew in Data Collection

Label Imbalance and Anomaly Prevalence

Correlation


Boxplot of entropy values

Model Architecture

X_train shape: (763137, 8, 9)
X_val shape: (188960, 8, 9)
X_test shape: (188960, 8, 9)
y_test distribution: [ 30528 158432]
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer)        │ (None, 8, 9)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm (LSTM)                     │ (None, 8, 128)         │        70,656 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 8, 128)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_1 (LSTM)                   │ (None, 64)             │        49,408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 64)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ repeat_vector (RepeatVector)    │ (None, 8, 64)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_2 (LSTM)                   │ (None, 8, 64)          │        33,024 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 8, 64)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_3 (LSTM)                   │ (None, 8, 128)         │        98,816 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_3 (Dropout)             │ (None, 8, 128)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ time_distributed                │ (None, 8, 9)           │         1,161 │
│ (TimeDistributed)               │                        │               │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 253,065 (988.54 KB)
 Trainable params: 253,065 (988.54 KB)
 Non-trainable params: 0 (0.00 B)
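The parameter counts in the summary can be sanity-checked by hand: a Keras LSTM layer with u units on d-dimensional input holds 4·(u·(d+u) + u) weights (four gates, each with a kernel, a recurrent kernel, and a bias):

```python
def lstm_params(d, u):
    # Four gates, each: input kernel (d*u) + recurrent kernel (u*u) + bias (u)
    return 4 * (u * (d + u) + u)

counts = [
    lstm_params(9, 128),   # lstm                      -> 70,656
    lstm_params(128, 64),  # lstm_1                    -> 49,408
    lstm_params(64, 64),   # lstm_2                    -> 33,024
    lstm_params(64, 128),  # lstm_3                    -> 98,816
    128 * 9 + 9,           # time_distributed Dense(9) ->  1,161
]
total = sum(counts)        # 253,065, matching the summary
```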

Training and Validation Performance

  • 5-fold cross-validation:

    • Avg. validation loss: 0.1516 ± 0.1443.
  • Stable convergence.

Figure 7: Training and validation loss curves

Anomaly Detection Threshold

Distribution of reconstruction error separates normal vs. anomalous.

Figure 8: Histogram of test reconstruction errors (MAE)

Classification Metrics

Precision: 99.3%, Recall: 99.5%, F1-score: 99.4%.

Figure 9: Precision, recall, and F1-score vs. threshold

Confusion Matrix

Figure 10: Confusion matrix for final predictions

ROC Curve

ROC AUC = 99.5%.

Figure 11: ROC Curve

Conclusion

Key Results

  • The LSTM autoencoder successfully learned normal process behavior from the BETH dataset.

  • Detected malicious sequences with 99.3% precision, 99.5% recall, and a 99.4% F1-score.

  • High recall → most attacks detected (few false negatives).

  • Low false-positive rate (3.4%) → suitable for real-world deployment.

Implications

  • Detects novel/zero-day attacks without prior attack signatures.

  • Suited to real-time monitoring in dynamic and cloud environments.

  • Future work: attention mechanisms, ensemble methods, and periodic retraining to improve adaptability.

References

Malhotra, Pankaj, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. 2016. “LSTM-Based Encoder-Decoder for Multi-Sensor Anomaly Detection.” CoRR abs/1607.00148. http://arxiv.org/abs/1607.00148.
Nguyen, H. D., K. P. Tran, S. Thomassey, and M. Hamad. 2021. “Forecasting and Anomaly Detection Approaches Using LSTM and LSTM Autoencoder Techniques with the Applications in Supply Chain Management.” International Journal of Information Management 57: 102282. https://www.sciencedirect.com/science/article/abs/pii/S026840122031481X.
Trinh, Hoang Duy, Engin Zeydan, Lorenza Giupponi, and Paolo Dini. 2021. “Detecting Mobile Traffic Anomalies Through Physical Control Channel Fingerprinting: A Deep Semi-Supervised Approach.” IEEE Access 9: 82564–80. https://doi.org/10.1109/ACCESS.2021.3087287.